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ABSTRACT 

Aims. Estimating the marginal likelihoods is an essential feature of model selection in the Bayesian context. It is especially crucial 
to have good estimates when assessing the number of planets orbiting stars and the different models explain the noisy data with 
different numbers of Keplerian signals. We introduce a simple method for approximating the marginal likelihoods in practice when a 
statistically representative sample from the parameter posterior density is available. 

Methods. We use our truncated posterior mixture estimate to receive accurate model probabilities for models with differing number 
of Keplerian signals in radial velocity data. We test this estimate in simple scenarios to assess its accuracy and rate of convergence 
in practice when the corresponding estimates calculated using deviance information criterion can be applied to receive trustworthy 
results for reliable comparison. As a test case, we determine the posterior probability of a planet orbiting HD 365 1 given Lick and 
Keck radial velocity data. 

Results. The posterior mixture estimate appears to be a simple and an accurate way of calculating marginal integrals from posterior 
samples. We show, that it can be used to estimate the marginal integrals reliably in practice, given a suitable selection of parameter 
A, that controls its accuracy and convergence rate. It is also more accurate than the one block Metropolis-Hastings estimate and can 
be used in any application because it is not based on assumptions on the nature of the posterior density nor the amount of data or 
parameters in the statistical model. 

Key words. Methods: Statistical, Numerical - Techniques: Radial velocities - Stars: Individual: HD 3651 



1. Introduction 

The selection between a collection of candidate models is of 
significant in all fields of astronomy but especially so, when 
the purpose is to extract weak planetary signals from noisy 
data. The ability to tell whether a signal is present in data as 
reliably as possible is essential in several searches for low- 
mass exoplanets orbiting nearby stars, whether made using the 
Doppler spe ctroscopy method; e . g. the Anglo - Austr alian Planet 
Search (e.g. Tinn ev et al.L 1200 ll 'Jones et al.L 120021 and refer- 
ences therein), Hi gh-Accu rac y Radial V elocity Planet Searcher 



. itv 

(e.g. iMavor et al.i 120031; iLovis et al.[ 1201 U and references 
therein), Hic h Resolution Echelle Spectrometer (e.g. lVogt et al.L 
11994 1201 Ol and references therein); by searching photometric 
transits; e.g. Convection Rotation and Planetary Transits (e.g. 
[Barge et al., 2007; Hebrard et al., 2 01 ll and references ther ein), 
WASP (e.g. rCollier Cameron et al.Ll2'007tlFaedi et al.Ll20Tll and 
references thr erein); or other possible techniques, such as as- 
ti-ometry (e.g. iBenedict et all 120 02; Pravdo & Shaklan, l2009t) 



and transit timing (e.g. iHolman & Murravt i2005l) or other cur- 
rent or future methods. 

Using Bayesian tools, it is possible to determine the rela- 
tive probabilities for each statistical model in some selected col- 
lection of models to assess their relative performance, or rela- 
tive ability to explain the data in a probabilistic manner. This 
is also important in the context of being able to assess their in- 
abiUty t o explain several da ta sets in terms of the model inade- 
quacy of iTuomi et al.l (1201 lb . Especially, when different statisti- 
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cal models contain different numbers of planets orbiting the tar- 
get star, assessing their relative posterior probabilities given the 
measuremen ts is ex tremely important to detect all the signals in 
the data (e.g.| Gregorv, 2005, 2007a b; Tuomi & Kotiranta, 20091 
Tuomi, 2012) and to avoid the detection of false positives (e.g. 
Bean et al...2010 ; Tuomi, 2011). However, determining the pos- 
terior probabilities require the ability to calculate marginal in- 
tegrals that are complicated multidimensional integrals of likeli- 
hood functions and priors over the whole parameter space. While 
there are several methods of estimating the values of these inte- 
grals, those that are computationally simple and easy to imple- 
ment are more often than not the poorest one s with respect to 
their accuracy and convergence properties (e.g. Kass & Raftervl 
1995; Clyde et al., 2007; Ford & Gregory, 2007). There are also 
more complicated methods for estimating multidimensional in- 
tegrals but they may provide more difficult computational prob- 
lems themselves than typical data analyses are, which makes it 
difficult to use them in practice. 

Because of these difficulties and the need to be able to assess 
the marginal integrals reliably, we introduce a simple method 
for estimating the marginal integrals in practice if a statistically 
representative sample from the parameter posterior density ex- 
ists. As such a sample is usually calculated when assessing the 
posterior densities of model parameters using posterior sam- 
plin g algorithms (e .g. iMetropohs et a n il953UHastingsl 119701: 
Ha ario et al.L 1200 Ih . the ability to use the very same sample in 
determining the marginal integral is extremely useful in practice. 
There are methods for taking advantage of the poster ior sample 
in this manner (e.g . Newton & Raftery, 1994; Kass & Raftervl 
119951: IChib & JeUazkov^ ,2001.: ,Clvde et al.. ,2007) but their per- 
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formance, despite som e studies (e.g. iKass & Raftervl Il995t 
iFord & Gregorvl l2007l) . is not generally well known, especially 
so in astronomical problems, and some of them may also require 
samplings from other densities simultaneously, such as the prior 
density or the proposal density of the Metropolis-Hastings (M- 
H) output, making their application difficult. 

In this article, we introduce a simple method that can be used 
to receive accurate estimates for the marginal integral. We test 
our estimate, called the truncated posterior mixture (TPM) esti- 
mate in scenarios where the marginal integral can be calculated 
accurately using simpl e existing methods. The d eviance infor- 
mation criterion (DlC, ISpiegelhalter et all, |2002|) is asymptoti- 
cally an accurate estimate when the sample size, i.e. the sample 
drawn from the posterior density, increases and can be used if the 
posterior is a multivariate Gaussian. Therefore, we compare our 
estimate with the DlC estimate in such cases to test its accuracy 
in practice. If accurate, our estimate is applicable whenever a 
statistically representative sample from the posterior is available 
because we do not make any assumptions regarding the shape of 
the posterior density when deriving the TPM estimate. The only 
assumptions are, that such a sample exists and it is statistically 
representative. We also calculate the marginal likelihoods using 
the s imple Akaike information criterion (AlC) fo r small sample 
size (lAkaikel[T97llBurnham & Andersonll2002h . the harmonic 
mean (HM) estimate that is a special case of the TPM with 
poor convergence properties, and the One Block Me tropolis- 
Hastings (OBMH) method of IChib & Jeliazkovl (1200 ll) that re- 
quires the_sirnujtane^ o f posterior a n d prop osal den- 
sities. iKass & Raftervl d 19951) and IClvde et al.1 ilOOlti give de- 
tailed summaries of different methods in the context of model 
selection problems. 

Finally, we also test the performance of the TPM estimate 
and the effects of prior choice in simple cases where it is pos- 
sible to calculate the marginal integral from a sample from the 
prior (with the common mean estimate) and/or using direct nu- 
merical integration. Especially, we show the undesirable effects 
of Bartlett's paradox on the marginal integrals and demonstrate 
that the TPM estimate actually circumvents these effects in prac- 
tice. 

2. Estimating marginal integrals 

In the Bayesian context, the models in some a priori selected 
collection can be equipped with relative numbers representing 
the probabilities of having observed the data m if the model was 
a correct one. Therefore, for k different models M\ , Ait, these 
probabilities are calculated as 



P(Mi\m) 



P{m\MdP(Md 
i:%,P(m\Mj)P(Mjy 



(1) 



where PiMd are the prior probabilities of the different models 
and the marginal integrals, sometimes called the marginal likeli- 
hoods, are defined as 



P{m\Mi)= l{m\6i,Mi)M0i\Mi)dei 



(2) 



and / denotes the likelihood function and 7r(0|Al,) is the prior 
density of the parameters. 

The truncated posterior mixture estimate that approximates 
the marginal integral is defined as (see Appendix) 



TPM 



(1 - A)lipi + Ali-hpi- 



(1 - A)lipi + Ali-hpi-h 



(3) 



where Z, is the value of the likelihood function at 0,, pi is the 
value of prior density at 0, , and A e [0, 1] and /i e N are parame- 
ters that control the convergence and accuracy properties of the 
estimate. While it is easy to select h - it only needs to be large 
enough such that 0, and are independent - selecting param- 
eter A is more difficult. If A is too large, the sample from the 
posterior is not close to the sample from the importance sam- 
pling function g in the Eq. ( IA.6I 1 in the Appendix, and the re- 
sulting estimate for the marginal is biased. Conversely, too small 
values of A, while making the estimate more accurate, decrease 
its convergence rate because the estimate approaches asymptot- 
ically the HM estimate that is known to have extremely poor 
convergence properties (see the Appendix and Kass & Rafteryl 
1995). Therefore, we test different values of A to find the best 
choice in applications. We note, however, that when 0, and Oi^i, 
are independent, i.e. when h is large enough given the mixing 
properties of the Markov chain used to draw a sample from the 
posterior density, the TPM can converge to the marginal inte- 
gral. The reason is that it is clear from Eq. (O that occasional 
very small values of /,-, that consequently have a large impact on 
the sums in the estimate, do not slow down the convergence as 
much as they would in the HM estimate because it is unlikely 
that is also small at the same time. This is the key feature in 
the TPM estimate that ensures its relatively rapid convergence in 
practice. 

We estimate the integral in Eq. (|2|i using five methods. The 
HM estimate (see Appendix), the truncated posterior mixture es- 
timate introduced here, t he DlC, AlC, and the OBMH method of 
IChib&.Teliazkovl(l2001h . While the DlC is a reasonably practi- 
cal estimate in certain cases, it requires that the posterior is uni- 
modal and symmetric and can be approximated as a multivariate 
Gaussian density, which is only rarely the case in applications. It 
can be easily calculated by using the average of the likelihoods 
and the likelihood of the parameter mean, which also reveals 
why the posterior needs to be unimodal and symmetric for reli- 
able results. These means do not reflect the properties of the pos- 
terior in the cases of skewness and multiple modes, not to men- 
tion nonlinear correlations between some parameters in vector 
6. The DlC is asymptotically accurate when the sample size be- 
comes large ( Spie gelhalter et afl. 12002 1 We do not consider the 
HM estimate to be a trustworthy one but calculate its value be- 
cause it is a special case of the truncated posterior mixture esti- 
mate when /i = (or 1). The AlC could provide a reasonably ac- 
curate estimate in practice, and therefore we compare its perfor- 
mance in various scenarios. However, it relies on the maximum 
likelihood parameter estimate, and does not therefore take into 
account the prior information on the model parameters. Its ac- 
curacy also decreases as the amount of parameters in the model 
increases or the number of mea surements decreases. Fin ally, we 
calculate the OBMH estimate (IChib & Jeliazkovl l200lh . While 
this estimate appears to provide reliable results, e .g. the number 
of companions orbiting Gliese 58 1 determined in 
was supported by additional data (Forveill e et al.L 



Tuo 
201 Ih 



Idmil) 

its per- 
formance has not been studied throughly with examples. It is 
also computationally more expensive than the TPM estimate, 
and indeed the other estimates compared here, because it re- 
quires the simultaneous sampling from the proposal density of 
the M-H algorithm. 

When assessing the convergence of our TPM estimate given 
some selection of A, we say that it has converged if the estimate 
at the /th member of the Markov chain, namely Ptpm{i), satis- 
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fies \PTPM(i + k) - PTPM(i)\ < r for all A: > and some small 
number r - in accordance of the standard definition of conver- 
gence. However, in practice, we use the logarithms of Pjpm and 
a value of r = 0.1 on the logarithmic scale for simplicity. We 
also approximate the estimate as having converged if the con- 
vergence condition holds for Q < k < 10^ for practical reasons. 
While all the estimates except the AIC (which is based only on 
the maximum likelihood value) converge the better the greater 
sample they are based on, we only plot this convergence for the 
TPM estimate. For DIC, HM, and OBMH, we calculate the final 
estimate using the mean and standard deviation of values from 
several samplings. 



3. Prior effects on marginal integrals 

Because the marginal integrals in Eq. (|2]l are integrals over 
the product of likelihood function and prior probability den- 
sity of the model parameters, the choice of prior has an ef- 
fect on these integrals of different models. One such choice 
for standard model of radial velocity dat a was proposed in 
iFord & Greg ory! (l2007l) and applied in e.g. iFeroz et alJ (1201 ih 
and Gregorv 12011 ). Specifically, this prior Umits the parameter 
space of jitter amplitude a-j to [0, /To], that of reference velocity 
y to {-K(), Kq\, and that of velocity amplitude of the /th planet, K,, 
to [0, Ko(Pmi„/Piy^^], where P„„„ is the shortes t allowed period- 
icity a nd Pi is the orbital period of the ith planet. lFord & Gregory! 
(12007!) propose that the hyperparameter /To should be set to 2129 



ms , which corresponds to a maximum planet-star mass-ratio of 
0.01. 

We assume for simplicity that f - Pi, which leads to 
a constant prior for the parameter Ki. It then follows that the 
prior probability density of a A:-Keplerian model has a multi- 
plicative constant coefficient proportional to ^q^"* - this cor- 
responds to the hypervolume of the parameter space of the k- 
Keplerian model. Because this constant also scales the marginal 
integral in Eq. it can be seen that increasing A'o can make 
the posterior probability of any planetary signal insufficient to 
claim a detection, because the ratio P(m\A[k) / P{m\Aiii-\) is pro- 
portional to X'q 

The above ca n also b e descr ibed in more genera l terms. In 
fact, as noted by ' BartlettI (Il957h and iJeff^revs! Jl90), choosing 
a prior for any model with parameter such that 7T(ff} - ch{0) 
for all 6* e 0, where is the corresponding parameter space, 
can lead to undesired features with respect to model comparison 
results. Assume that this choice is made for model M\ but for 
a simpler model Mq, for which parameter 9 does not exist (the 
"null hypothesis"), this prior does not exist either because the 
corresponding parameter is not a free parameter of the model. 
Then, the posterior probability of model M\ becomes 



P(Mi\m) oc P(m\Mi)P(Mi) 

^cP(Mi) f l(m\e, Mi)h(ff)d0, 
Jee@ 

where c - 



X 



hi0)de 



(4) 



Setting the prior constant such that h(0) - 1, yields c - 
y(0) where V(@) denotes the hypervolume of the parameter 
space, and leads to the inconvenient conclusion that as the hyper- 
volume of the parameter space increases, the posterior prob- 
ability of the model M\ decreases below that of the Aio, which 
prevents the rejection of the null hypothesis regardle ss of the 
observed data m. This is called the Bartlett's paradox (iBartleti 



ll957HKass & Raftervl IT995h but it does not mean that improper 
and/or constant priors are useless nor that they should not be 
used in applications. 

A convenient way around this "paradox", can be received by 
considering the definition of the parameters. Because the analy- 
sis results should not depend on the unit system of choice, nor 
the selected parameterisation, i.e. whether we choose parameter 
or 0' = f(0), where / is an invertible (bijective) function, it 
is possible to choose the parameter system in a convenient way 
that makes c = 1 by transforming 0' - f{0) with some suitable 
/. For some choises of / the constant prior of parameter does 
not correspond to a constant prior for ff , but we do not consider 
this well-known effect of prior choice further here. 

For instance, if we apply this to a Gaussian likelihood with 
mean g{0) (e.g. a superposition of k Keplerian signals in radial 
velocity data) and variance cr^, it becomes one with mean g(f{0)) 
and variance cr^ - this does not chance the posterior density of 
the parameters as we can always make the transformation back 
using but a convenient choice of / sets c = 1 and prevents the 
prior probability density of the parameters from having undesir- 
able effects on the marginal integrals. A similar transformation 
of cr is also possible as long as the /(cr) retains the same units 
as the measurements have. Therefore, we are free to define the 
model parameters in any convenient way, using e.g. any unit sys- 
tem, and this, as long as we retain the same functional form in 
our statistical model, cannot be allowed to have an effect on the 
results of our analyses. Specifically, when analysing radial ve- 
locity data, choosing the unit system such that K' - KK^^ does 
not change the posterior density nor the values the likelihood 
function has but it makes 7t{K') - 1 for all K' e [0, 1], which 
does not result in different weights for the models with differ- 
ent numbers of planets. We demonstrate these effects further in 
Section 5 by analysing artificial data sets. 

We note, that this procedure does not interfere with the 
Occam's razor that is a built-in feature of Bayesian analysis 
methods. It still holds, that as the number of free parameters 
in the statistical model increases, this model also becomes pe- 
nalised the more heavily. The reason is, that increasing the di- 
mension of the parameter space effectively increases the hy- 
pervolume that has a reasonably high posterior probability (but 
lower than the MAP estimate) given the data - this increases the 
amount of low likelihoods in the posterior sample and in Eq. 
(O, which in turn decreases the estimated marginal integral as it 
should in accordance with the Occamian principle of parsimony. 



4. Comparison of estimates: radial velocities of HD 
3651 

To assess the performance of the TPM estimate for the marginal 
integral, we compare its performance with different selections of 
parameter A in simple cases where the marginal integral can be 
calculated reliably using the DIC, i.e. when the model param- 
eters receive close-Gaussian posteriors and the sample size is 
large. Therefore, as test cases, we choose radial velocity time- 
series made using several telescope-instrument combinations 
that have different velocity offsets and different noise levels. The 
simple model without any Keplerian signals provides a suitable 
scenario where the DIC is known to be accurate and the accuracy 
of our estimate can be assessed in practice. 

The nearby KO V dwarf HD 3651 has been reported to 
be a host to a 0.20 Mj^p exoplanet with an orbital period of 
62.23 + 0.03 days and an orbital eccentricity of 0.63 + 0.04 
(iFischer et al.Ll2003l) . The radial velocity variations of HD 3651 
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have been ob s erved using the HIRES at the Keck I telescope 
(iFischer et all 120031: iButler et all |2006") and the Shane and 
CAT telescopes a t the Lick observatory (Fischer et al., 2003; 
iButler et"an. 12006). These datasets contain measurements at 42 
and 121 epochs, respectively. The reason we chose these data is 
that they enable us to investigate several scenarios reliably. The 
fact that the planet orbiting HD 365 1 is on an eccentric orbit and 
there is plenty of data available make it possible to assess the 
accuracy of the TPM estimate in several scenarios by enabling 
the comparison to the DIC estimate that is accurate as long as 
the posterior density is Gaussian. Therefore, we investigate the 
accuracy and convergence properties of of the TPM in various 
scenarios: with high and low numbers of data compared to the 
number of model parameters, and when the marginal likelihoods 
of two models are close to each other and as far from each other 
as possible given the available data. 

We analyse the radial velocities of HD 365 1 made using the 
HIRES and Lick exoplanet surveys and calculate the marginal 
likelihoods of models with and 1 Keplerian signals using the 
methods based on DIC, AIC, TPM, HM, and OBMH. We denote 
these estimates of the integral in Eq. ^ as Pdic, Paic, Ptpm, 
Phm, and Pqbmh, respectively. We also calculate the marginal 
integrals for a very simple case of 0-Keplerian model and HIRES 
data using a sample from the prior {Pm) density and direct nu- 
merical integration {Pd)- 

4.1. Case 1. HIRES data 

The HIRES data with 42 epochs reveals some interesting differ- 
ences between the five estimates for marginal integrals. The log- 
marginal integrals are plotted in Fig. [T] as a function of Markov 
chain length. The estimated uncertainties of DIC and OBMH es- 
timates represent the standard deviations of six different Markov 
chains. The DIC estimate can be considered a reliable one in this 
case, because the posterior density is very close to a multivariate 
Gaussian. It can be seen that the AIC is biased because of the 
low number of measurements (42) compared to the number of 
parameters of the statistical model (7). Also, the OBMH estimate 
gives the 1 -planet model a greater marginal likelihood than DIC. 
However, the TPM is similarly biased for = 0.5 , 0. 1 , 1 0"^, 1 0"^ 
but converges to the DIC estimate for A - 10 10"^. The HM 
estimate is not shown in the Fig. because its extremely poor con- 
vergence properties - it receives values between -130 and -140 
on the logarithmic scale of Fig. [1] 

When using as small values of A for the TPM as possible 
such that it converges in the sense that it approaches some lim- 
iting value, we calculate the Bayes factors (B) in favour of the 
one-Keplerian model against the model without Keplerian sig- 
nals. These values are shown in Table[T] The TPM estimate con- 
verges to the same value as DIC, which is known accurate in 
this case because the posterior densities of both models are very 
close to Gaussian. However, the AIC and OBHM overestimate 
the posterior probability of the model containing a Keplerian sig- 
nal. Also, the problems of the HM estimate are clear because its 
uncertainty becomes greater than the estimated value. 

4.2. Case 2. Combined HIRES and Lick data 

Increasing the number of measurements likely makes the AIC 
yield a more accurate estimate for the marginal likelihood. 
However, to see how this affects the other estimates, we again 
compare them to the DIC which is reliable because of the 
close-Gaussianity of the posterior density. The inclusion of addi- 




N 

Fig. 1. Marginal integrals of the 1 -planet model given the HIRES 
data (case 1): DIC and its 3cr uncertainty (black dashed line and 
black dotted lines), AIC (blue dashed), OBMH and its 3o- un- 
certainty (red dashed and red dotted), and the TPM estimates 
with A = 0.5,0.1, 10-2, lQ-\ 10-4, 10-5 (black, grey, blue, pur- 
ple, pink, and red curves). 

Table 1. Bayes factors in favour of the one-Keplerian model 
given the HIRES data (case 1). 



Estimate 


B 


TPM 


1.1x10"* ± 1.2 X 10" 


DIC 


l.lxlO'-* ± 1.1 X 10" 


AIC 


3.3x10" 


OBMH 


2.8x10'^ ±5.6 X 10" 


HM 


3.3x10" ±6.5 X 10" 



tional Lick data also makes the posterior probability of the one- 
Keplerian model much greater than that of the model without 
Keplerian signals, and enables us to investigate the accuracy and 
convergence of the TMP in such a scenario. Therefore, we study 
the properties of the different estimates for marginal integrals 
using the combined HIRES and Lick data of HD 365 1 with 163 
epochs. 

TPM converges to the DIC estimate when A = 10-^ for the 
model without any Keplerians, whereas its convergence takes 
place for A = 10-^ for the one-Keplerian model (Fig. |2] pink 
curve). Clearly, the AIC is indeed closer to the DIC estimate 
because of the greater number of data but the OBHM is also 
consistent with the DIC estimate. We note that the HM estimate 
is again omitted from the Fig.|2]because it receives significantly 
lower values than the other estimates. 

Now, we calculate the Bayes factors in favour of the one- 
Keplerian model and present them in Table |2l The TPM esti- 
mate is again very close to the DIC estimate and the AIC is 
close to these, providing slightly greater support for the one- 
Keplerian model. The OBMH again overestimates the one- 
Keplerian model and the HM estimate, while this time being 
rather accurate, has an uncertainty in excess of the estimate it- 
self. Clearly, the TPM estimate can be used to receive reliable 
estimates for the marginal integral in this case as well, because 
the posterior density is again very close to Gaussian and the DIC 
estimate is therefore a reliable one in assessing the integral. 
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5x10 



4 



10^ 1.5x10^ 2x10^ 



N 



Fig. 2. As in Fig. [T] but for the combined data (case 2) and the 
TPM estimates with A = 0.1, 10"^ lQ-\ IQ-'^, lQ-\ 10"^ (black, 
grey, blue, purple, pink, and red curves). 

Table 2. Bayes factors in favour of the one-Keplerian model 
given the combined HIRES and Lick data (case 2). 



Estimate 



B 



TPM 

Die 

AIC 

OBMH 

HM 



2.0x10'' + 1.0 X 10'' 
2.2x10^'* ± 1.9 x 10" 

5.7xl03« 
2.0x10"*' ±9.0x 10^' 



1.4x10^' 



1.7 X 10^' 



Table 3. Bayes factors in favour of the one-Keplerian model 
given the partial HIRES data (case 3). 



Estimate 



B 



TPM 

Die 

AIC 

OBMH 

HM 



3.0x10" ±5.1 X lO"* 
2.8x10^ ±8.1 X 10"* 

1.2x10' 
1.4x10* ±3.3x10' 
4.3x10' ± 8.1 X 10' 



4.3. Case 3. Partial HIRES data 

As a third test, we calculate the different estimates for marginal 
integral given only 20 epochs of HIRES data - the first 20 epochs 
between 366 and 2602 JD-2450000 - to see their relative per- 
formance when the number of parameters is comparable to the 
number of measurements. We find that the TPM converges to the 
marginal integral very accurately with A - 10"^ for both models 
and yields very reliable estimates for these integrals. It is again 
very close to the DIC estimate, making it reliable because of the 
Gaussianity of the posterior density for both models and the con- 
sequent reliability of the DIC estimate. It is not surprising that 
the AIC overestimates the Bayes factor and therefore also the 
posterior odds of the one-Keplerian model because of the low 
number of data. However, the OBMH overestimates it as well as 
was in fact found to be the case in test cases 1 and 2 as well. 



5. Artificial data: effect of prior choice 

We demonstrate further the properties of the TPM estimate by 
comparing its performance to more traditional integral estima- 
tion techniques. We generated four sets of artificial radial veloc- 



Table 4. Bayes factors in favour of model Aii for data sets SI, 
S4 received using TPM estimate and the brute force (BP) 
approach for two priors, n\ and 7:2. 



Data 


TPM 


BF;ri 


BF;r2 


SI 


5.0x10" 


3.7x10" 


3.1x10' 


S2 


5.3x10' 


5.6x10' 


4.7x10^ 


S3 


1.2x10' 


71 


6.0x10"^ 


84 


35 


0.88 


7.5x10-" 



ity data and determined the number of Keplerian signals using 
the TPM estimate and an estimate received using brute force ap- 
proach, i.e. direct numerical integration of the product of like- 
lihood and prior over the parameter space. To demonstrate the 
conclusions in Section 3, we use an improper unit prior, i^. 
TTi - n{0) - 1, and a broad prior of iFord & Gregory! (i2007h 
with f „„■„ = Pi, denoted as ni, to show how they affect the con- 
clusions that can be drawn from the same data. 

The artificial data sets were generated by using 200 ran- 
dom epochs such that the first epoch was at f = and the 
ith one was selected randomly 1-10 days later within and in- 
terval of 7.2 hours, which simulates the fact that observations 
can only be made during the night. We generated the velocities 
by using a sinusoid with a period of 50 days and an amplitude 
of K and added Gaussian random noise with zero mean and a 
variance of 1 -H cr^ where cr,- describes the standard deviation 
of the artificial Gaussian instrument noise. The values cr, were 
drawn from a uniform density between 0.3 and 0.6 for every 
simulated measurement. We generated sets SI, S4 by using 
K = 1.0, 0.8, 0.6, 0.5, respectively. 

We show the model comparison results of the four artifi- 
cial data sets in Table |4] This Table contains the Bayes factors 
in favour of the model with one Keplerian signal and against 
a model with no signals at all. We show the estimates calcu- 
lated using a direct brute force numerical integration (BE) for 
the two priors {n\ and 712) and the TPM estimate (that has ap- 
proximately the same values for both priors, so we show only 
the results for n\). These Bayes factors show, that the Bartlett's 
paradox clearly pr events the detection (i.e. a Bay es factor in ex- 
cess of 150; Kass & Rafteryill995l:lTuomill20I2ij) of the periodic 
signals in data set S3, whereas the TPM estimate, that does not 
fall victim to this paradox, yields a positive detection. The signal 
in the set S4 is too weak for detection. 

It can be seen in Table|4]that the TPM estimate yields Bayes 
factors that support the existence of a signal in the data sets S 1 
- S3. In fact, the only data set where the signal could not be de- 
tected (S4), the Markov chains did not converge to a clear maxi- 
mum in the period space either but several small maxima out of 
which none could be said to be significantly more probable than 
the others. In all the rest, the chains converged to a clear maxi- 
mum corresponding to the periodic signals added to the artificial 
data sets. 

It can also be seen how the broader prior {712) changes the 
Bayes factors when estimating the marginal integrals by direct 
numerical integration. Relative to the unit prior (tti), the Bayes 
factors are roughly a factor of 10^ lower for JI2, and actually only 
provide a detection of the signal in data set S2 by exceeding the 
150 threshold only just. This shows that the jt2 corresponds to 
a priori model probabilities that are by a factor of 10^ more in 
favour of the model without Keplerian signals - clearly an unde- 
sirable side-effect of the priors of Eord & Gregory (2007). Yet, 
the TPM estimate, and the corresponding Bayes factors, turned 
out to have roughly the same values for both priors as suspected 
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because any constant terms in the prior do not affect the TPM 
estimate. Therefore, the TMP estimate enables the detection of 
weaker signals in the data than estimates that depend on con- 
stant coefficients in the prior density, and consequently, affect 
the prior probabilities of the models. 

6. Conclusions 

Calculating the marginal integral for model selection purposes is 
generally a challenging computational problem. While there are 
several good estimates for these integrals, they are usually only 
applicable under certain limiting assumptions about the nature 
of the posterior density, the amount of parameters in the statisti- 
cal model, or the number of measurements available. Therefore, 
we have introduced a new method for estimating these integrals 
in practice. Given the availability of a sample from the posterior 
density of model parameters, our truncated posterior mixture es- 
timate is a reasonably accurate one and very easily calculated in 
practice. We have only assumed that a statistically representative 
sample drawn from the posterior density exists when deriving 
our posterior mixture estimate (see Appendix). Therefore, it is 
applicable to any model comparison problems in astronomy and 
other fields of scientific inquiry and is not restricted to problems 
where the posterior has a certain shape and dimension. 

The comparisons of different estimates given the radial ve- 
locities of HD 365 1 revealed that the TPM yields estimates very 
close to the DIC estimate, which is known to be a reliable one 
in case of Gaussian posterior density. In fact, we chose the HD 
3651 as an example star because of the planet orbiting it is 
known to have an eccentric orbit that enables the Gaussianity of 
the probability distributions of eccentricity and the two angular 
parameters of the Keplerian model, namely, longitude of peri- 
centre and mean anomaly. However, the simple small-sample 
version of the AIC proved reasonably accurate as well when 
the number of measurements well exceeded the number of free 



for A that still converges to receive a trustworthy TPM estimate 
and correspondingly trustworthy model selection results in any 
model selection problem. 

Finally, because any constant coefficients in the prior prob- 
ability densities have an effect on the marginal integrals by cor- 
responding to different prior weights for different models, we 
have shown how the TPM estimate deals with this problem. 
Effectively, it corresponds to setting the constant coefficients in 
the prior equal to unity, which makes the TPM estimate indepen- 
dent of the unit choice of the parameters. 
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Appendix A: IVIarginal integrals from importance 
sampling 

In the context of Bayesian model selection, the marginal integral 
needed to assess the relative probabilities of different model is 



P(m\M) 



= r l(m\9, M)7T(9\M)d9, 

Jee& 



(A.l) 



where A1 is a model with parameter vector constructed to 
model the measurements m using the likelihood function I. 
Function n{6\M) is the prior probability density of the model 
parameters. This quantity is essential in calculating the posterior 
probabilities of different models in Eq. ([T]i. 

Importance sampling can be used to receive estimates for 
the integral in Eq. dA.lb . Choosing functions g and w such that 
7t{6) - w{6)g(6) and dropping the model from the notation, the 
marginal integral can be written using the expectation with re- 
spect to density g as 



parameters of the model (e.g. Table 



OBMH estimate of IChib & Jeliazkov 



2l). We also note that the ^ 

120011) . while converging Eg[w(6')/(m|6')] = I g(6)w(6)l(m\e)d6 = P(m). (A.2) 
=(] results that exasperate Jse& 



rapidly, tends to yield somewhat biased results that exaggerate 
the posterior probability of the more complicated model, mak- 
ing it possibly - at least in the test cases of the current work - 
prone to detections of false positives. 

In practice, the TMP can be used by calculating its value 
directly from the sample drawn from the posterior density of 
the model parameters. Selecting a suitable value for parameter 
A is then of essence when calculating its value in practice. In 
all the three different test cases studied in this article, a choice 
of A - 10"^ yielded estimates that converged rapidly for all the 
models in all the test cases and resulted in posterior probabili- 
ties that differed little from those calculated using the DIC es- 
timate. When the difference between the two models was the 
smallest (case 3.), there was practically no bias in the TMP es- 
timate with respect to the DIC. Also, when the posterior odds 
of the one-Keplerian model was the greatest (case 2.), the TPM, 
with A - 10""*, overestimated the posterior probability of the 
one-Keplerian model by a factor of 10, though, in that case, the 
Bayes factor used in model selection was already so heavily in 
favour of the one-Keplerian model that this overestimation is not 
significant in practice in terms of being able to select the best 
model. 

Because of the possible biases caused by too large A, it would 
then be convenient in practice to calculate the TPM estimate us- 
ing few different values of parameter A. With the sample from the 
posterior density available, this could be done with little compu- 
tational cost. Then, it would be possible to use the lowest value 



where g{6) is usually called the importance sampling function. 
Now, the idea of importance sampling is that if we draw a sample 
of values from g and denote 6, ~ g(6) for all / = 1, A^, we 
can calculate a simple estimate for the marginal integral as (e.g. 
lKass&Raftervl[T99l 



P = 



9i) 



y 7m 

2-1 



1-1 



(A.3) 



All there remains is to choose g such that it is easy to draw a sam- 
ple from it and that the estimate in Eq. ( IA.3I I converges rapidly 
to the marginal integral. 

Some simple choices of g would be the prior density or the 
posterior density. In these cases, the resulting estimates would 
be called the mean estimate and the harmonic mean e s timate , 
respectively (iNewton & Raftervl Il994t iKass & Raftervl Il995h . 
We denote these estimates as Pm and Phm and write 



1 ^ 

Pm = -^/(m|0,) 

/=1 



N 
and 

Phm^N 



l{m\ei) 



(A.4) 



(A.5) 
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Though easily computed in practice, these estimates have 
some undesirable properties. For instance, the mean estimate 
requires drawing a sample from the prior density and compu- 
tation of the corresponding likelihoods. However, because the 
prior contains less information and is therefore much broader 
density than the posterior, most of the values in this sample cor- 
respond to very low likelihoods and the convergence of this esti- 
mate is generally slow. The resulting value is also dominated by 
few high likelihoods, which can make it too biased to be useful 
in applications, except in very simple cases. 

Also, while converging to the desired value, the har- 
monic mean es t imate does it extremely slowly in practice 
(lKass&Raftervlll995h and its usage cannot be recommended. 
In applications, this estimate doesn't generally converge to the 
marginal integral within the limited sample available from the 
posterior. The reason is that occasional small values of l{m\9i) 
have a large impact on the sum which makes its convergence ex- 
tremely slow. For these reasons, better estimates are needed to 
approximate the marginal integrals in model selection problems. 



A.1. The posterior mixture estimate 

To construct a better estimate for the marginal integral, we start 
by assuming that a statistically representative sample has been 
drawn from the posterior density using some posterior sam- 
pling algorithm. Therefore, we have a collection of vectors 
6i ~ n{6\m), for all / = 1, A^. These values form a Markovian 
chain with members. Selecting integer h > 0, the value of the 
posterior 7r(6',_/,|m) is available if the value corresponding to 0, is 
available given i > h > 0. Here we can denote tt, = 7r(9i\m) and 
see that if 0, is a random vector then tt,- is some random num- 
ber corresponding to the value of the posterior at 0, . Using the 
notation similarly forg,, and setting /I e [0, 1], we can set 



gi ^(l- A)7ri + Ani^ 



h- 



(A.6) 



Now, if /I is a small number, it follows that gi ^ tt,- - 
the importance sampling function g is close to the posterior 
but not exactly equal. We call it a truncated posterior mixture 
(TPM) function. The sample from the posterior is close to a 
sample from g - a. desired property because a sample from the 
posterior can be calculated rather readily ^ith posterior sam- 
pling algorithms (e .g. iMetropolis et all Il953l : iHastingsi 119701: 
iHaario etaUl2001h . The estimate in Eq. ( IA.3I ) can now be cal- 
culated. We denote - l(m\9i) and pi = n(0i) and write the 
resulting posterior mixture estimate as 



TPM 



r N 



liPi 



(1 - A)lipi + Ali^hPi-h 
1 

Pi 
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y 

(1 - A)Upi + Ali-hPi-h 



(A.7) 



If the Markov chain has good mixing properties such that the 
value 0i has already become independent of the likelihoods 
of these values are also independent. When comparing this esti- 
mate with Phm in Eq- ( IA.5I ). it can be seen that occasional small 
values of Z, do not have such a large effect on the sum in the de- 
nominator because it is unlikely that the corresponding value of 

is also small at the same time. 
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